Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a preview version of the de-novo pipeline #641

Merged
merged 19 commits into from
May 2, 2024

Conversation

VJalili
Copy link
Member

@VJalili VJalili commented Feb 8, 2024

This PR adds a preview version of the de-novo pipeline, initially implemented by @alsanju, @nicolecalamari, and Steph Hao. This PR keeps the original functionality intact and makes the minimally required changes listed below.

  • Add a Dockerfile, which was created by taking a brute-force approach trying to make an image as the experimental image used during the initial development phase that was created without a Dockerfile.
  • Consistently add set -exuo pipefail to all the tasks.
    • Fix bugs previously ignored/silenced.
    • Replace all head and tail commands with their awk equivalent since head/tail fail pipefail.
    • Add || true to the grep calls to address cases when it fails to find a pattern in its input.
  • I rewrote the tabix_query method because of issues such as “Popen” yielding by char instead of a line and errors such as “device is busy.”
  • Replace deprecated syntax with their current latest equivalent (e.g., ${sep=" " input} to ~{sep=" " input}, replacing $ with ~).
  • Replace all python3.x (e.g., python 3.6 and 3.9) calls with python, hence consistently using the version of python that comes with the Docker image (i.e., python3.10), and implement the minimally necessary changes in the python scripts of the pipeline.
  • Update for consistency in setting up runtime attributes and, for instance, changing templates such as "~{select_first ...} GB" to select_first ... + " GB".
  • Refactor variables named with reserved words (e.g., variables called list or file are all refactored).
  • Refactor list/tuple-like returns in Python scripts that are not lists or tuples (e.g., refactor return (a) to return a).
  • Remove unnecessary parenthesis from if conditions and for loops (e.g., refactor if (a == 10) to if a==10).
  • Anti-pattern comparisons are replaced with their correct equivalent (e.g., refactor a == None with a is None).
  • Simplify asserts or loops when possible (e.g., change if a == True to if a).
  • Improvements in syntax; refactoring all variables, classes, workflows, methods, tasks, and file names to adhere to the same conventions as those used in the GATK-SV pipeline.
  • Match runtime struct naming with the convention used in the GATK-SV pipeline;
  • Improvements in style, such as consistent use of indentation and line breaks.
  • Remove commented-out, temporary, or unused code.
  • Update all the scripts and WDLs to adhere to PEP8, miniwdl, and womtool requirements.

Improvements to be implemented in the follow-up PRs:

  • Currently, the scripts lack any error handling, making debugging challenging. Hence, the scripts must be updated with minimal error/exception handling.
  • In Python scripts, many variable names shadow outer scope variables, which may lead to unwanted overrides. Such variables need to be refactored.
  • Python scripts contain confusing and anti-pattern use of NA; for instance, dataframe columns are set to "NA" that may not function as intended when asserting for np.nan in that column.
  • Python scripts contain methods that not all their execution path returns a non-void value; such methods need to be updated to provide valid returns for all their execution path.

Testing

Successfully tested on the methods' team Cromwell server: 60d3c93f-7c14-420c-9b3b-381ee4d6e789

@VJalili VJalili marked this pull request as ready for review February 15, 2024 13:48
@VJalili VJalili requested a review from mwalker174 February 15, 2024 13:48
Copy link
Collaborator

@mwalker174 mwalker174 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @VJalili for getting this in and your patience with my response. A lot of detective work you must've done to get the docker working - well done!

I have some comments about the WDLs to clean them up a bit - there are quite a few places where we can re-use code.

@VJalili
Copy link
Member Author

VJalili commented May 2, 2024

Thank you, @mwalker174!

@VJalili VJalili merged commit 76060cf into broadinstitute:main May 2, 2024
5 checks passed
@VJalili VJalili deleted the denovo branch May 2, 2024 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants